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^ Abstract 

^ We study the benefits and limits of parallelised Markov chain Monte Carlo (MCMC) sampling in cosmology. 
Q MCMC methods are widely used for the estimation of cosmological parameters from a given set of observations 

and are typically based on the Metropolis-Hastings algorithm. Some of the required calculations, such as 
Q evaluating the likelihood, can however be computationally intensive, meaning that a single long chain can 

take several hours or days to calculate. In practice, this can be limiting, since the MCMC process needs to 
I — I be performed many times to test the impact of possible systematics and to understand the robustness of the 
O measurements being made. To achieve greater speed through parallelisation, algorithms need to have short 
u auto-correlation times and minimal overheads caused by tuning and burn-in. In order to efficiently distribute 
^ the MCMC sampling over thousands of cores on modern cloud computing infrastructure, we developed a 



I ""Python framework called CosmoHammer which embeds emcee, an implementation by |Foreman-Mackey et al.| 
O ( |2012[ ) of the affine invariant ensemble sampler by [Goodman and Weare ( 2010[ ). We test the performance 
of CosmoHammer for cosmological parameter estimation from cosmic microwave background data. While 
, ^ , Metropolis-Hastings is constrained by overheads, CosmoHammer is able to accelerate the sampling process 
from a wall time of 30 hours on a single machine to 16 minutes by the efficient use of 2048 cores. Such short 
^ wall times for complex data sets opens possibilities for extensive model testing and control of systematics. 

^ Keywords: Markov chain Monte Carlo methods. Cloud computing, Cosmological parameter estimation 
O 



1. Introduction 

CN Bayesian inference is a standard procedure in cos- 
T1 mology when measurement results are compared to 
• ^ predictions of a parameter-dependent model. The 
^ likelihood function £,(6) is defined as the conditional 
^ probability of a measurement outcome x given the 
model parameters are fixed to 9: £.{6) := pix\9). 
Bayesian inference tells us how to update our knowl- 
edge from a prior distribution q(0) to a new poste- 
rior distribution which accounts for the recent 
measurement: 

p"'''\e) oc q{e)p(x\e). 
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For exploring the likelihood function X or the 
posterior distribution of the parameters, Markov 
chain Monte Carlo (MCMC) algorithms are today's 
method of choice when no functional expressions 
are available. Starting with the analysis of cosmic 



microwave background (CMB) data in 2001 ( [Chris 
tensen et al. l [200T|[Knox et aLHIOOTj ), MCMC meth- 
ods turned into a vital tool in the analysis of astro- 
nomical data from all sorts of cosmological probes. 

Most of the analyses in the literature rely on a 
particular instance of a MCMC method, the so-called 



Metropolis-Hastings algorithm from 1970 ( |Metropo 
lis et al.[fl953[|Hastings[|1970[). The Fortran program 



CosmoMC by Lewis and Bridle (20021, for example, is 
a widely used and very successful tool for parameter 
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estimation in the cosmology community, based on the 
Metropolis-Hastings algorithm and the Boltzmann 
integrator CAMB (Code for Anisotropics in the Mi- 
crowave Background by [Lewis et al.| ( |2000] )). A large 
number of scientific projects have used CosmoMC, 
among them prominent observations as the Wilkin- 



son Microwave Anisotropy Probe (WMAP) (Dunkley 



etal. 2009 1 or the Sloan Digital Sky Survey (Tegmark 



etan2004). 



Recently, Foreman-Mackey et al. ( 2012[ ) presented 
emcee, a Python implementation of a novel MCMC 



algorithm by [Goodman and Weare| ( |2010; ) which has 
several potential advantages over Metropolis-Hastings: 
the sampling depends on less tuning-parameters, it 
is affine invariant (i.e. it does not depend on linear 
transformations of the parameters) and parallelisable. 

Creating samples for the estimation of cosmologi- 
cal parameters from CMB measurements, for exam- 
ple, typically takes a couple of hours or even days 
on a desktop computer when using the Metropolis- 
Hastings algorithm. Whenever the sampling process 
has to be repeated a number of times in order to study 
the fit of various distinct models to the data or to find 
out about the systematic s of a measurement, such a 
long run time gets increasingly problematic. We there- 
fore focus on the parallelisability of MCMC methods 
in order to minimise the wall time of the calculation. 

One of the main questions we try to answer is 
how fast one can, in principle, generate a useful sam- 
ple of a given distribution when using emcee on an 
extended computing environment such as a cloud ser- 
vice or grid computer. For this purpose, we com- 
bined the Boltzmann integrator CAMB and the WMAP 
likelihood code and data ( [Larson et al] |2011| |Ko- 
matsu et si] |2011| ) (both written in Fortran90) with 



the emcee sampler by Foreman-Mackey et al. (2012 1 
in a Python framework — called CosmoHammer in the 
following — allowing us to study the example of pa- 
rameter inference from CMB data in detail. We carry 
out a careful analysis of CosmoHammer's sampling 
efliciency, focusing in particular on the implications 
arising from the simultaneous sampling on multiple 
physical computers. 

Taking full advantage of large scale computing 
environments such as the Amazon EC2Q is a non- 



trivial task. We therefore present the Python frame- 
work itself, since CosmoHammer has been designed 
for optimal computational performance and efficiency 
on such environments. The architecture of our code 
makes CosmoHammer easily extendable to other cos- 
mological applications, as it is straightforward to plug 
in Python modules containing further likelihood func- 
tions or codes for theory predictions. 

In addition, parameter estimation with CMB data 
is well documented in the literature and standard 
MCMC tools are publicly available, making it a good 
reference point for the performance of the algorithm 
by Goodman and Weare as compared to a Metropolis- 
Hastings sampler. 

We proceed as follows: In section [2[ the analysis 
of cosmological data using MCMC methods is briefly 
introduced and limitations of the current state-of-the- 
art are discussed. Section [3] explains how we address 
these limitations using the algorithm by [Goodman and 



Weare[ ([2010[) and its implementation by [Foreman" 



|Mackey et al.[ ( [2012] ). Introducing our code in section 
|4} we review its components, explain its architecture 
and outline the parallelisation scheme. We test the 
code in section|5]by sampling the WMAP 7 likelihood 
and comparing the results to MCMC chains from the 
Metropolis-Hastings sampler CosmoMC by Lewis and[ 
Bridle (2002) in an extensive analysis. In section[6] 
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we assess the performance of CosmoHammer on dif- 
ferent cloud computing configurations. We discuss 
the results and conclude in section |7j Finally, we ex- 
plain the installation of the package and give detailed 
examples for running the algorithm in the appendix. 

2. Markov chain Monte Carlo for cosmology 

In the following, we give a brief introduction 
to MCMC methods by discussing the Metropolis- 
Hastings (MH) algorithm as an example. Afterwards, 
we focus on the diflficulties one faces when applying 
those methods to cosmological data. 

2.1. MCMC methods 

Assume we are interested in a probability den- 
sity function p(6) (called target distribution in the 
following) which is not given explicitly but can be 
calculated numerically up to a constant factor. If we 
want to learn about p(6) we need to estimate it from 



a finite number of numerical evaluations. MCMC 
algorithms generate a sample distributed according to 
the target distribution in a probabilistic fashion. We 
illustrate it using the MH algorithm as an example. A 
classic textbook for further information is [MacKay 
( 2003 1, which is also on the web|5 

Imagine the sampling process as a random walk 
in the parameter space of the target distribution. The 
walk has to be initialised at a point and a common 
method is to start at a random position close to the 
region where the likelihood is expected to be centered. 
In order to determine the next position, we need to 
specify a proposal density P which has to be easy 
to sample and is usually chosen to be a Gaussian 
distribution. The probability of randomly proposing 
the new position 6' when being at 9t — the position 
in the sample — is given by P{6';6t). After sampling 6' 
from P, the new position is accepted with probability 



min 1, 



pW')PWAe') 
Pie,) P{e';9r) 



The updated is finally given by 0' if the step is 
accepted or by Of if it is rejected. The set {6t}te{Q, -,T] 
converges to a sample from the target distribution p(6) 
for large T ( [Metropolis et aLj [1953[ [Hastings} [1^0]). 

It is worth noting that the efficiency of the sam- 
pling crucially depends on the chosen proposal distri- 
bution. If P mainly proposes positions in the relevant 
parts of parameter space where the target distribution 
is large, the chain very quickly converges to a sample 
of p{6). Yet, if the proposal distribution tends towards 
positions where p(6) has low probability, most of the 
steps are rejected and samples will be heavily corre- 
lated. Using a proposal which is close to p(6) is a 
convenient choice, as it guarantees that the proposed 
positions and the target are distributed similarly. 

2.2. Application to cosmology 

In cosmology the target distribution often arises 
from Bayesian inference. We consider the example of 
parameter estimation from CMB data for illustration. 
An observation measuring the CMB yields the angu- 
lar power spectrum of the radiation as a result. At 
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the same time, this power spectrum can be predicted 
from theoretical models for the evolution of the uni- 
verse and it takes a few seconds to calculate it using a 
numerical Boltzmann integrator like CAMB. 

The likelihood function X of the parameters in 
the model arises from the conditional probability of 
the measurement result X given the parameters 6 = 
(01, • • • ,64): 

m := p{X\e). 

If we already have information on the parameters in 
the form of a prior distribution q{6) from previous ob- 
servations, X is used to update it according to B ayes' 
rule: 

p"''\e) oc £(e)q(e). 

Both X and the posterior p"'^^' depend on the re- 
sults of the numerical Boltzmann integrator and are 
hence not available as analytical functions of 0. As 
the number of parameters in this problem is usually 
greater than or equal to 6, MCMC methods have to 
be used for estimating £ or p"^*. Although MCMC 
sampling makes the estimation feasible for large di- 
mensions, it still sufl'ers from the fact that calling 
the likelihood function — i.e. running the Boltzmann 
integrator for some specified parameters and compar- 
ing the results to the data in the CMB example — is 
costly in terms of time and resources. Consequently, 
we want our sampler to be as efficient as possible 
in terms of likelihood function calls per converged 
sample. 

Additionally, MH sampling is by definition a se- 
rial process, iterating the calls to the likelihood func- 
tion several thousand times. Using a single MH chain, 
it typically takes at least a few hours before estimat- 
ing the cosmological parameters from the sampled 
distribution yields robust results. 

When inferring the parameters of the concordance 
model of cosmology ACDM, for example, waiting 
for a few hours once is bearable. For general appli- 
cations, though, this will not suffice. Testing how 
well various other models different from ACDM fit 
the data implies that the likelihood has to be explored 
separately for every distinct model. Furthermore, it 
is often necessary to introduce so-called nuisance pa- 
rameters which model the process of data generation. 
Altering the number and the effect of the nuisance 
parameters is a way to test for the systematics in a 
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measurement and requires the MCMC sampling pro- 
cess to be repeated multiple times. Whenever the 
MCMC analyses have to be iterated, it is hence desir- 
able to decrease their run time — either by improving 
on the underlying MCMC process or by running the 
calculations in parallel on a cluster or cloud com- 
puting environment. In the next section we explain 
why the Python sampler emcee is a good choice for 
achieving such a speed-up. 

3. Afflne invariant ensemble sampler 

We learned in section [2] that MCMC sampling can 
be very time consuming in cosmological applications 
when the likelihood function is hard to evaluate nu- 
merically. Here we study how this can be overcome 
when using more recent MCMC algorithms such as 



the one by Goodman and Weare (2010 1 (called GW in 
the following) instead of MH. We therefore introduce 
the algorithm and its implementation by ForemanJ 
[Mackey et al.| (2012l before discussing its advantages 
in terms of efficiency and parallelisation. 

2.1. The ensemble sampler by Goodman and Weare 
Instead of a single position which is updated dur- 
ing the course of the sampling, GW uses an ensemble 
of walkers which are spread on the parameter space of 
the target distribution. At every iteration, the walkers 
are assigned to a partner and a random point on a 
line connecting their positions is proposed as the next 
step. 

More formally, let 6\ denote the position of walker 
/ after t iterations. When updating this position to 
we pick another walker 0/ at random with j 4^ i, 
sample a value z from the fixed distribution 



q(z) = 



1 



if Z 6 



1 



-, a 



otherwise 



with tuning parameter a and propose the position 6' = 
6] + z{(^t ~ ^/)- After evaluating the target distribution 
p at the proposed position, we accept the step if 



with r being a random number from [0, 1] and d the 
dimension of p. 



GW is affine invariant, i.e. invariant under linear 
transformations of the target distribution. This implies 
in particular that the sampler is not sensitive to the 
scales of the parameters and does not depend on the 
covariances of the target distribution. 

3.2. Emcee 

The algorithm by [Goodman and Weare] ( |2010[ ) 
was slightly altered and implemented as the Python 
module emcee by |Foreman-Mackey et al. (2012). In 
this implementation, the algorithm does not update 
the walkers serially but instead divides them into two 
subgroups and updates all of the walkers from one 
subgroup at a time, using the other half of the walkers 
as their references. 



3.3. Advantages 

In section [2] we mentioned that MH needs a pro- 
posal distribution P which is governing the efficiency 
of the algorithm. Thinking of P as a J-dimensional 
Gaussian distribution with unknown covariance ma- 
trix, one finds that the MH has d{d -i- l)/2 tuning 
parameters. When the scales of variances and co- 
variances of the target distribution are well known 
before the sampling, the tuning parameters are very 
helpful: By supplying the MH with an appropriate 
covariance matrix, the parameter space is explored 
efficiently and the sample quickly converges. Yet, if 
the target distribution is unknown and the covariance 
matrix is only a guess, the efficiency of the algorithm 
generally decreases. In case of non-Gaussian target 
distributions even a tuned Gaussian proposal might 
not match the target and efficiency can be low. 

On the other hand, GW is affine invariant and thus 
not sensitive to the scales of the target distribution. 
Hence, the efficiency of the sampling process does 
not depend on having a good estimate of the target 
distribution before the sampling and a tuning of the 
algorithm is not necessary. 

Equally important is the use of an ensemble of 
walkers instead of a single chain. As the emcee imple- 
mentation updates a large number of positions simul- 
taneously at every iteration of the process, it can be 
parallelised in order to take advantage of a compute 
cluster (see section]?]). 

The emcee sampler based on GW thus has the 
potential to improve upon the limitations outlined in 
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section [2j First, we expect the sampling process to 
be fairly robust to changes in data and model. Sec- 
ond, it is possible to distribute the sampling on the 
multiple nodes of modern cluster or cloud comput- 
ing environments. Both advantages can be achieved 
without tuning of the sampler or parallelisation of the 
underlying likelihood and theory codes. 

4. Implementation of CosmoHammer 

We have developed a Python framework called 
CosmoHammer for the estimation of cosmological pa- 
rameters. The software embeds the Python package 
emcee by Foreman-Mackey et al. ( 2012[ ) and gives 
the user the possibility to plug in modules for the 
computation of any desired likelihood. The major 
goal of the software is to reduce the complexity when 
one wants to extend or replace the existing compu- 
tation by modules which fit the user's needs as well 
as to provide the possibility to easily use large scale 
computing environments. 

4.1. Architecture 

We applied the principle of chaining modules for 
the computation of the likelihood as depicted in Fig- 
ure[T| The class diagram shows the underlying archi- 
tecture and the most important classes. The architec- 
ture allows for the development of self-contained and 
tested modules which can be assembled in different 
sampling configurations. The internal design sepa- 
rates parameter space exploration, theory prediction 
and likelihood computation. This makes it easy to 
extend or replace these modules by new algorithms. 

The concept of CosmoHammer divides the mod- 
ules in two logical groups: modules for the compu- 
tation of the likelihood and core modules. The core 
modules produce information like the CMB power 
spectrum or other theory predictions for the likeli- 
hood modules. The individual core modules can be 
combined in an instance of the LikelihoodComputa- 
tionChain module (see Figure [T]). The chain stores 
the modules and initialises them in the right order. 
Furthermore, it ensures that the sampled parameters 
stay within physically motivated bounds during the 
sampling process. 

The modules in the LikelihoodComputationChain 
communicate via the ChainContext in which arbitrary 



data can be stored and retrieved. This minimises the 
dependencies between the individual modules and 
ensures that they can be replaced without the need to 
change or extend CosmoHammer. 

4.2. Components 

CosmoHammer comes with a set of modules which 
compute the CMB power spectrum and the WMAP 
likelihood by wrapping the theory prediction code 
CAMB by [Lewis etU^pOOOl ) and the likelihood code 
from the WMAP team (ILarson et all 120111 IKomatsu 



et al.[ 2011 ). It integrates the two Fortran modules 



by wrapping them using numpys F2PY|^ The Fortran 
to Python interface generator provides the connec- 
tion between Python and Fortran. In this way we 
can take full advantage of the well tested and widely 
used Boltzmann integrator CAMB as well as of the 
WMAP likelihood computation module while com- 
bining them with the emcee sampling algorithm. Fur- 
thermore, by using the wrapped code we benefit from 
the performance of Fortran in the convenient Python 
environment. 

4.2.1. CAMB and WMAP modules 

To use CAMB and the WMAP code we create an 
instance of the LikelihoodComputationChain and add 
an instance of the CambCoreModule and CmbWmap- 
LikelihoodComputationModule . The CambCoreMod- 
ule delegates the computation of the power spectrum 
to the CambWrapperManager and the CmbWmap- 
LikelihoodComputationModule in turn delegates the 
likelihood computation to the WmapWrapperMan- 
ager. 

Alternatively an instance of the preconfigured Cm- 
bLikelihoodComputationChain can be created. This 
chain extends the regular LikelihoodComputation- 
Chain and ensures the correct setup of the two mod- 
ules. A detailed example is given in [Appendix B| on 
page 
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4.2.2. Samplers 

CosmoHammer also provides difl'erent samplers. 
The samplers embed the emcee package and are re- 
sponsible for logging and storing the obtained results. 
The CosmoHammerSampler is used when running 
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CosmoHammer::LikelihoodComputationChain 



-likelihoodModules 

-coreModules 

-mill 

-max 



+addLikelihoodModule(in module) 
+addCoreModule(in module) 
+isValid(in values) 
+setup() 
+ call (in p) 



CosmoHammer::CosmoHammerSampler 



+_init_() 
+startSampling() 



CosmoHammer::MpiCosmoHamnierSampler 



+mpiParallelizedMap(in function, in sequence) 
+mpiBCast(in value) 



CosmoHammer: :ChainContext 



•data 



+add(in key, in value) 
+get(in key) 
+remove(in key) 



mpi4py 



CosmoHammer: : ConcurrentMpiCosmoHammerSampler 



+getMap() 



multiprocessing 



CosmoHammer: :CmbLilielilioodComputationChain 



CosmoHaiTmier::CambCoreModule 



+ call (in chainContext) 



CosmoHammer: :CmbWmapExtLilceliiioodComputionModule 



+postProcessPowerSpectrum(in chainContext) 



Camb Wrapper: :CanibWrapperManager 



+setup() 

+computeCmbPowerSpectrum(in values) 



CosmoHammer: :CmbWmapLikelihoodComputionModule 



+computeLikelihood(in chainContext) 



WmapWrapper: : WmapWrapperManager 



+setup() 

+computeWmapLikelihood(in cl tt, in ci te, in cl ee, in cl bb) 



Figure 1: Class diagram of CosmoHammer's main architecture components. 



on a single physical computer using one or multi- 
ple threads. The functionality, however, is limited to 
a single computer. The extended MpiCosmoHamm- 
erSampler provides the required functionality when 
CosmoHammer should take advantage of a computa- 
tion cluster with multiple physical nodes like a cloud 
or grid computer. This sampler uses Message Passing 
Interface (MPI^for the communication between the 
nodes in the cluster. 

The implementation uses the paradigm of work- 
load partitioning in which the work is split into blocks 
of nearly equal length. Every node then processes its 
block and returns the results to the master node. The 



master node gathers all results and merges them into 
a single list which then is returned to emcee. 

When using a compute cluster the nodes often 
come with a large number of computational cores. 
Writing code that fully benefits from such a large 
number of cores is usually difficult. Therefore, it 
makes sense to split the workload also on the node 
since using a smaller number of cores per compu- 
tation while performing multiple computations in 



parallel is typically more efficient. In section 6.2 
it can be seen how the execution time decreases when 
CosmoHammer uses multiple processes on one physi- 
cal node. 

If it is desired to distribute the workload to sev- 
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eral nodes in a cluster as well as to spawn multiple 
processes on a node, the provided ConcurrentMpi- 
CosmoHammerSampler can be used. This sampler 
introduces another level of parallelisation by using 
Python's built in multiprocessing package. 

4.2.3. Initialisation 

By default all samplers initialise their walkers in 
a ball around a given center point as suggested in 
Foreman-Mackey et al. ( 2012| ). The starting position 
6q of every walker / is then computed as follows: 

where Pc is the center point and P^ is the start width. 
Both Pc and P^ have to be supplied by the user for 
every parameter. Furthermore, CosmoHammer comes 
with a built in generator producing a top-hat distribu- 
tion as follows: 

ff^ = Pc + e* P„, 

where e is a pseudorandom number ranging from - 1 
to 1. 

If one prefers to launch the sampling process with 
a different starting strategy, it is possible to pass a 
custom implementation of a PositionGenerator to the 
CosmoHammerSampler instance. 

5. Tests and results 

For testing the efficiency of CosmoHammer we 
compare it to CosmoMC, a widely used MCMC en- 



gine for cosmological parameter estimation by Lewis 
and Bridlel (|2002|). CosmoMC is a Fortran90 code 



which also uses CAMB and the WMAP likelihood code 
for estimating cosmological parameters from CMB 
data, but it employs the MH algorithm for creating its 
MCMC chains. It furthermore contains an extensive 
choice of additional datasets and analysis modules. 
To optimise the samphng process, [Lewis and Bri- 



dle] ( |2002[ ) use the following seven default parame- 
ters for describing the ACDM model: the physical 
baryon density Q-bh?-, the physical dark matter density 
Q.j}Mh^, the ratio of the approximate sound horizon 
to the angular diameter distance 9, the reionisation 
optical depth Tre, the scalar spectral index n^, the 
primordial superhorizon power in the curvature per- 
turbation on O.OSMpc'^ scales A^, and finally A^,, a 



Sunyaev-Zel'dovich template normalisation. Further- 
more, some of the parameters are rescaled to end up 
with similar orders of magnitude. 

As emcee is affine invariant, the particular choice 
of parametrisation is not expected to influence its 
performance. For simplicity, we decided to use the 
same ones as CosmoMC up to rescaling, yet replacing 
6 by Bubble's constant Hq. 

5.1. Analysis of MCMC chains 



Following the lines of jForeman-Mackey et al. 
( |2012| ) we adopt the autocorrelation time as our pri- 
mary quality criterion for a MCMC sampler. Consider 
a probability density function p{6), a MCMC sample 
{6,} of this distribution, and a function f{Q) whose 
mean (/) = f f(0)p(9) dO we wish to estimate. The 
autocorrelation C// of f(9) evaluated at the sample 
points {9t} with lag T is defined as: 

Cff(T) := am) - imm^r) - (/»). 



Typically, the autocorrelation of a MCMC sample is 
non-zero and decaying with increasing lag: Cff{T) oc 
exp(-;^). However, if the points {6',} were indepen- 
dent of each other, the autocorrelation function would 
vanish for all T > 1 . 

Let us also define the normalised autocorrelation 
function 

Cff{T) 

= CMO) • 

where C/y (0) is the variance of the sample {/,} := 
{f(9t)}. There are two relevant timescales connected 
to pff{T). On the one hand, there is the exponential 
autocorrelation time Texp which is defined by 



T„„ = lim sup 



T—^oo 



-log \pff{T)\' 



(1) 



On the other hand, the integrated autocorrelation time 
Tint is given by 



1 



-I- 



(2) 



T=l 



In general Texp and t,„, are not equivalent, although 
they are both equal to t// for exponentially decaying 
autocorrelation functions C//. 
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The reason why Tgvp and t,„, are relevant for the 
analysis of MCMC samples is connected to the issues 
of thermalisation at the beginning of a MCMC chain 
(often called burn-in) and the statistical errors one has 
to account for when evaluating the expectation value 
of f{6) using the sample {dt} (see the lecture notes by 
Sokal| ( [T989l ) for more information). 

The exponential autocorrelation time tells us how 
many iterations should be discarded at the beginning 
of the Markov chain in order to avoid an initialisa- 
tion bias, since it measures the time we have to wait 
for two positions in the sample {fi} to be close to 
uncorrelated. Discarding a few exponential autocorre- 
lation times at the beginning of the sampling typically 
suffices to suppress the bias. 

At the same time, the integrated autocorrelation 
time determines the standard error of the mean through: 



Var(/) = ^Var(/,), 



(3) 



where N is the size, / is the mean, and Var(/f) is the 
variance of the sample {/,}. 

An estimate for Cff(T) is given by 



CffiT) « CffiT) = -1- Y,{ft-mt.T-f). (4) 



N-T 



N-T 



t=i 



Estimating the integrated autocorrelation time from 
^ is not easy because of issues regarding the statis- 
tical noise in the large T limit of CffiT). Yet, the 
autocorrelation function for our problem is exponen- 
tially decaying, meaning that we can also evaluate 
Texp instead of T,„f. We find that estimating t„„ from 
a fit to Cff(T) is the best choice for evaluating the 
autocorrelation time of this particular problem (see 
Appendix C for more information). As Texp and r,,,, 
are equivalent for our purposes, we denote both as the 
autocorrelation time r for simplicity. 

5.2. Multiple walkers 

We typically demand that MCMC samples satisfy 
the following accuracy criterion: The statistical error 
of the mean / we wish to estimate — defined in equa- 
tion ([3]) — has to be smaller than a given fraction e of 
the standard deviation of its marginal distribution. In 
other words 



[ Var(/) 
Var(/) 



I2t 



(5) 



Table 1: Each emcee walker and CosmoMC chain is initialised at 
a random position around the mean, within a deviation sampled 
from a Gaussian distribution with given width. 



Pai'ameter 


Ho 






lO-'Aj 




Tie 


Asz 


e 


Mean 


70 


0.0226 


0.122 


2.1 


0.96 


0.09 


1 


1.04 


Width 


3 


0.001 


0.01 


0.1 


0.02 


0.03 


0.4 


0.002 


Lower bound 


40 


0.005 


0.01 


1.48 


0.5 


0.01 





.5 


Upper bound 


100 


0.1 


0.99 


5.45 


1.5 


0.8 


2 


10 



and consequently 



2t 

l2' 



(6) 



where A'^ is the total size of the sample. For multiple 
walkers or independent chains, is given by the num- 
ber of steps per walker or chain n (called sequential 
steps in the following) times the number of walkers L 
and hence 

(7) 

Equation ([5]) only holds when the sample is unbi- 
ased by the initialisation and this is true in the asymp- 
totic limit of large n. When sampling on a cloud 
computing infrastructure, however, we need a large 
number of walkers L for maximum parallelisability 
and hence expect rather small n according to (|7]). 

Yet, the sample gets close to unbiased when dis- 
carding the initial steps of every walker or chain as 



burn-in. We already mentioned in section 5.1 that 
such a burn-in phase is expected to last for a few au- 
tocorrelation times. When using only a few walkers, 
i.e. L close to one, n is large and the burn-in phase 
is a subdominant part of the overall sampling. As 
L grows, though, the number of sequential steps n 
becomes comparable to the burn-in length and the 
discarded samples turn into a dominant fraction of the 
overall sample size. We analyse the consequences of 
this result in section [ 



5.3. Configuration of CosmoMc and CosmoHammer 

In the following we want to compare CosmoHammer 
to two diff"erent configurations of CosmoMC. In each 
case we sample the likelihood given by the Fortran90 
code and the data of the WMAP 7 team. 

The first CosmoMC instance is a out-of-the-box ap- 
proach, using the standard configuration shipped with 
the CosmoMC package. It initialises the chain in a 
small ball around an estimated mean of the likelihood 



Table 2: Estimated autocorrelation time t for the seven parame- 
ters and the three sampler configurations. 



Sampler 


«o 








"i 




Asz 


CosmoHaiimier 


44.4 


44.7 


AAA 


43.4 


43.8 


43.8 


48.2 


fine-tuned CosmoMC 


17.3 


17.1 


15.1 


13.8 


17.6 


14.4 


18.4 


self-tuning CosmoMC 


16.6 


14.9 


19.1 


13.3 


15.0 


14.8 


16.3 



using the numbers from Table [T] and supplies a co- 
variance matrix to specify the Gaussian distribution 
which is used as the proposal. We will refer to this 
configuration diS fine-tuned CosmoMC. 

The second approach employs an option of CosmoMC 
which splits the sampling into two phases. Starting 
with an initial guess for the covariance matrix in the 
tuning phase — a diagonal matrix with estimated vari- 
ances in our case — the sampler continuously updates 
the proposal's covariance matrix from the last half of 
the generated samples. Afterwards, CosmoMC uses the 
generated proposal to create the samples for estimat- 
ing the likelihood. This approach will be called self- 
tuning CosmoMC from now on. When using multiple 
independent chains, CosmoMC allows to automatically 
stop the tuning phase once a convergence criterion is 
fulfilled. We use this option to ensure that the sam- 
pling after the tuning phase is comparable to the fine- 
tuned CosmoMC process. For our analyses, we used 10 
independent chains for both CosmoMC configurations. 

For CosmoHammer we call CAMB in exactly the 
same fashion as the standard CosmoMC configuration 
does (i.e. we use the same theoretical model for our 
cosmology) and also initialise the walkers according 
to Table [T] We furthermore used emcee with 350 
walkers, a rather arbitrary pick which was convenient 
to work with. 

The bounds listed in Table [T] are needed to ensure 
that the parameters which are passed to the Boltzmann 
integrator CAMB make sense physically. Whenever the 
sampler proposes a position which is out of bounds, 
CosmoHammer returns zero probability immediately. 

5.4. Efficiency 

A good estimate for the autocorrelation function 
is important for the analysis of the sample. For both 
CosmoMC and CosmoHammer the autocorrelations 




100 200 300 400 
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500 



Figure 2: Mean sequential steps % of CosmoHsmimer (top) and 
fine-tuned CosmoMC (bottom) for parameter A^. The shaded 
regions depict the error in the mean <j{X,) for every mean se- 
quential step. Self-tuning CosmoMC runs need about 2000 initial 
steps for tuning, but start at an unbiased position afterwards. 



with X being one of the seven dimensions of the 
parameter- space, behave equivalently for all parame- 
ters. As the estimation of the autocorrelation times for 
the difl'erent sampler configurations is not straightfor- 
ward, we give a detailed description of our procedure 
Appendix C The results can be found in Table [2j It 



1 



-T 



CxiT) = ) (X, - X){X,^T - X), 



m 

is not very surprising that the CosmoMC processes are 
more eflicient in terms of calls per independent sam- 
ple, as they are using a well tuned proposal to sample 
a target distribution which is itself close to normally 
distributed. On the other hand, emcee needed no tun- 
ing while still performing reasonably well in terms of 
autocorrelation time. 

Additionally we need to consider the optimal length 
of the burn-in period before the actual sampling starts. 
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As was discussed in section 5.2 this is particularly im- 
portant if we want to iterate a large number of walkers. 
Let {XD denote the position of walker i e {I, ■ ■ ■ , L} 
at iteration t e {I, - ■ ■ , n} for parameter X. For finding 
the burn-in length, we observed the mean sequential 
step Xt 

L 

x, = J]xl 

for increasing time t. Once the positions of the walk- 
ers are drawn from the target distribution in an unbi- 
ased way, we expect Xt to vary around the true mean 
as predicted by the standard error in the mean 



Table 3: Mean, standard deviation and the expected statistical 
error of the parameters sampled by Cosmo Hammer and CosmoMC. 



cr{X,) = 



cr 



where cr is the standard deviation of the marginal 
target distribution for parameter X. As long as this 
is not the case, the walkers are still biased by the 
initialisation. 

Using this method we find that a safe choice 
for the burn-in period is 250 for both the fine-tuned 
CosmoMC and CosmoHammer, as can be seen in Figure 
[2] for parameter A^. This is surprising as both pro- 
cesses have very diff'erent autocorrelation times, but 
the MCMC algorithm of emcee turns out to be very 
efficient in getting rid of its initialisation bias. The 
self-tuning algorithm needs about 2000 iterations to 
estimate its proposal distribution, but already starts 
at an unbiased position afterwards. The reason for 
such a long tuning phase is that the sampling effi- 
ciency is very poor when the sampler is untuned in 
the beginning. 

We can now consider the statistical error e from 
equation (|7]) as a function of burn-in or tuning length 
b, sequential steps n, number of walkers L, and auto- 
correlation time r: 



2r 



(n - b)L 



for n > b. 



(8) 



It is visualised in Figure [3] for all sampler configu- 
rations, using T from table [2] and the corresponding 
burn-in and tuning values. 

We find that the most efficient way to create a 
sample which satisfies the accuracy criterion ([5]) is to 
use a fine-tuned CosmoMC. This is not surprising, as it 



Parameter 


CosmoHammer 


CosmoMC 


Statistical error 


Ho 


70.4!^:^ 


70.2:|t 


+0.1 


IOOQm/i^ 


9 947+0.065 
^•^^'-0.051 


9 940+0.057 

•^-0.057 


+0.002 






1116+""'*** 


±0.0002 




9 1 74+0.078 

^■^ '^-0.069 


9 1^,0+0.077 


±0.003 




0.967:«;«]^ 


0.966!°;°|^ 


±0.0005 


100t„ 


8.7!|:^ 


8.6!|i 


±0.05 


Asz 




1.01-- 


±0.02 



has a good estimate of the target distribution before it 
even starts the sampling process, resulting in a short 
autocorrelation time and burn-in phase. Yet, a well 
tuned MH is typically not available when analysing 
new data. In this case, the self-tuning CosmoMC con- 
figuration or similar concepts with lengthy tuning 
phases have to be used for configuring the sampler. 

It is this tuning phase which turns out to be the 
bottle-neck for the parallelisation of a MH sampler. 
We can see in Figure [3] that for as few as 10 walkers, 
CosmoHammer reaches the 10% error regime e < 0.1 
before the self-tuning CosmoMC run even finalises the 
tuning atn = 2000. When estimating the parameter 
X from the target distribution, the result usually reads 

X = X + criX), 

with estimate X, sample mean X, and standard devi- 
ation (t{X). Consequently, the statistical error does 
not affect X up to the second digit of cr(X) when 6 
is smaller than 0. 1 and is hence sufliciently small for 
most applications in cosmological parameter estima- 
tion. 

5.5. Parameter estimation 

We know from the previous discussion that the 
errors we expect for our estimates of the mean are 
of the order a yflrjN, where cr is the standard de- 
viation of the target distribution for the respective 
parameter. The total sample size after the burn-in 
was chosen to be = 250 x 350 = 87'500 for 
CosmoHammer, predicting an accuracy in the mean 
estimate of about 3.4% relative to the standard de- 
viation. The same precision will be reached when 
using about N = 3'500 x 10 = 35'000 samples from 
CosmoMC. 
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Figure 3: Statistical error e, defined in equation ([8]), as a function of sequential steps n and for different numbers of walkers. For 
simplicity, we assume in this plot that burn-in length and autocorrelation time are independent of the number of walkers or chains. 
The shaded regions show where the respective configurations reach statistical error e = 10% and e - \%. 



As we chose A'^ such that the error is smaller than 
3.4% of the standard deviation, the parameter esti- 
mates of the different sampler configurations are sup- 
posed to vary on this scale, too. We can see from 
Table [3] that this is indeed the case. Finally, Figure |4] 
shows the projections of the 7-dimensional likelihood 
into one and two dimensional marginal distributions. 

We conclude that the samples of CosmoMC and 
CosmoHammer behave just as expected from our anal- 
ysis in section [5TT| In particular, this means that the 



quality of the sampling is well understood once the 
autocorrelation of the MCMC process is known. Tests 
based on multiple independent instances of the em- 
ployed sampler configurations support these conclu- 
sions. 

6. Performance measures and metrics 

The performed computations required a large amount 
of computational power. We therefore decided to 



explore the possible benefits of cloud computing by 
means of CosmoHammer. One of the major advantages 
of this computing strategy is that the configuration of 
the cloud can be easily tailored to the problem at hand. 
In the cloud more computational power can be added 
within minutes by renting extra compute instances 
on demand, resulting in an optimised execution time. 
The following section describes the environment and 
the configuration used to perform the benchmarks. 

6.1. Environment 

As cloud service provider we decided to use Ama- 
zon EC2 in combination with the Starcluster Toolkit 
(Software Tool for Academic and Researchers£| Ta- 
ble |4] shows the configuration of the instance types 
we used for the benchmarks. The high performance 
computing cluster consisted of one master node and 



'http : //star . mit . edu/cluster/ 
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Figure 4: One and two dimensional marginal distributions of the WMAP 7 likelihood as sampled by CosmoHammer (top) and 
CosmoMC (bottom) for all parameter combinations. 
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Table 4: Amazon EC2 instance types used for the benchmarks 





Master node 


Worker node 


Name 


Large Instance 


Cluster Compute Eight Extra Large 


API Name 


m I .large 


cc2.8xlarge 


Memory 


1,7 GB 


60,5 GB 


Instance storage 


160 GB 


3370 GB 


Processing Power 


2 Cores 


32 Cores 



several worker nodes. At the moment of the bench- 
marks one cc2.8xlarge Instance ships with 2 x Intel 
Xeon E5-2670, eight-core architecture with Hyper- 
Threading, resulting in 32 cores per node. We used 
a ml. large instance as master node mainly to benefit 
from the high I/O performance in order to reduce the 
loading time of the WMAP data. 

To compile the native Fortran modules, we used 
Intel's ifort compiler and mkl libraries. Python 2.7, 
and numpy 1.6.2. 

6.2. Benchmarks 

The results depicted in Figure [5]have been realised 
with one to 64 worker nodes (32 - 2048 cores) and 
different combinations of processes and threads per 
node. The processes define the number of computa- 
tions executed in parallel and the threads represent 
the number of cores used for one computation. In the 
case of 4 processes and 8 threads, for instance, there 
were four Python processes working in parallel on 
every node, each of them spawning eight threads. 

For all test runs, CosmoHammer was configured to 
use 350 walkers and to run 500 sampling iterations, 
resulting in 175'000 samples per run. This sample 
size was also used for the parameter estimation in 
section 5.5 We used a LikelihoodComputationChain 
with the CambCoreModule and the CmbWmapExt- 
LikelihoodComputationModule for the computation 
of the likelihood. 

As it can be seen in Figure[5] CosmoHammer scales 
almost linearly with increasing number of computa- 
tional cores. The best result was achieved using 64 
nodes with 32 cores, four processes and eight threads. 
Using this configuration, the computation took about 
16 Minutes. Reproducing these results with a desktop 
computer would take up to 75 hours. 



6.3. Discussion 

The implemented algorithm for the parallelisation 
is efficient yet easy to understand. Note, however, that 
this way of parallelisation is only beneficial when the 
executed computations are time and resource consum- 
ing. Distributing the workload in a compute cluster 
always implies the transfer of information over the 
network which is typically slower than transferring 
information between local processes by an order of 
magnitude. Therefore, the advantage of additional 
computing resources and the disadvantage of the net- 
work overhead have to be weighted. 

Since we use CosmoHammer for the estimation of 
cosmological parameters which implies the execution 
of computationally intensive theory prediction mod- 
ules like CAMB, the overhead of the network latency 
plays a secondary role. We expect that CosmoHammer 
will be extended by additional theory and likelihood 
modules so that the network overhead can be ne- 
glected and the number of computational nodes and 
therefore CPUs can be increased. Nevertheless, there 
is a given maximum: The number of nodes cannot 
exceed the number of walkers. 

CosmoHammer can not only take advantage of mul- 
tiple nodes in a cluster but can be parallelised also 
on every node. As illustrated in Figure [5} there is a 
maximum for the number of processes on the node, 
too. The test run using eight processes and four OMP 
threads reaches a plateau when using 64 nodes. This 
is caused by the workload partitioning where the 350 
walkers are equally distributed on 64 nodes. Conse- 
quently, more processes are available than walkers to 
process, which in turn causes the processes and their 
spawned threads to idle instead of being used for the 
computation of the power spectrum. 

7. Conclusion 

In cosmological applications, accelerating MCMC 
methods is crucial whenever the sampling has to be 
repeated numerous times and evaluating the target 
distribution is computationally expensive. Examin- 
ing the measurement systematics of an astrophysical 
observation in an iterative analysis, for example, re- 
quires sampling an extensive amount of target distri- 
butions, each depending on time intensive simulations. 
Whenever it is not possible to massively parallelise 
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Figure 5: Run time behaviour of CosmoHammer with changing number of cores using different parallelisation schemes. 



the likelihood code, starting a large number of short 
chains enables us to evolve them in parallel not only 
on a couple of local CPUs, but globally on a large 
scale computing infrastructure. 

We found that two characteristics are crucial for 
the performance of parallelised MCMC sampling: It 
has to be efficient in terms of autocorrelation time 
and — equally important — it has to sample efficiently 
without needing much overhead for tuning and equili- 
bration. It is this overhead which puts a fundamental 
lower bound on the run time of the sampling proce- 
dure and can limit the application of MCMC methods 
when a large number of target distributions has to be 



explored. The emcee sampler by Foreman-Mackey 



|et al.| (2012) turned out to be a good choice regarding 
those preconditions: It not only requires no tuning of 
the algorithm in general, but also avoids an initiali- 
sation bias very quickly while performing fairly well 
in terms of autocorrelation time when applied to the 
likelihood of the WMAP 7 observations. 

In order to exploit the advantages of emcee on 
large cloud computing configurations or similar in- 
frastructure, we introduced CosmoHammer, a Python 
framework for parallelised MCMC sampling. It en- 
abled us to explore arising computer science tech- 
nologies like elastic cloud computing for scientific 
applications. The elastic nature of Amazon EC2, for 
example, ensures that scientists and engineers do not 



have to wait in long queues to access shared clusters 
as the computational power can be increased within 
minutes by renting additional compute instances. We 
reduced the time for estimating cosmological parame- 
ters using the WMAP 7 likelihood from 30 hours on a 
desktop computer running a MH sampler to about 16 
minutes on a cluster with 2048 cores on Amazon EC2. 
Furthermore, CosmoHammer scales linearly with in- 
creasing number of cores, highlighting the efficiency 
of the parallelisation concept. 

The implementation is not limited to inferring the 
parameters of ACDM using CAMB and WMAP, but the 
application programming interface of CosmoHammer 
allows for the extension of its application to further 
cosmological probes and models using newly devel- 
oped and self-contained Python modules. 

In the appendix, we describe the installation of 
CosmoHammer and give a guide for the correct setup 
when using CAMB and the WMAP likelihood. 

Appendix A. Download and Installation 

The tarballs containing the most recent and sta- 
ble version of CosmoHammer and the wrapper mod- 
ules can be found at http : //www . astro . ethz . ch/j 
ref regier/research/Sof tware/ cosmohammerl 

CosmoHammer relies on the following Python pack- 
ages: 
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• emcee- affine invariant MCMC sampler 

• numpy - Numerical Python 

• mpi4py - Python wrapper for mpi (optional, 
only used if CosmoHammer is supposed to be 
distributed on multiple nodes) 

When additionally using the wrapped WMAP like- 
lihood module, the WMAP datc0 needs to be accessi- 
ble on the filesystem and CFITSIOj^has to be installed. 
We tested CosmoHammer with Python 2.6, Python 2.7, 
numpy 1.6.2, emcee 1.1.2, and mpi4py 1.3, but it is 
likely to work with earlier versions of these Python 
packages. For the compilation of the Fortran wrap- 
pers, a Fortran compiler and mkl libraries are required. 
The current distribution has only been tested with In- 
tel's ifort compiler and mkl libraries, though. 

To install the components the tarballs have to be 
extracted and the following commands have to be 
executed in the root directory of each module: 

°/o python setup. py build 

% sudo python setup. py install 

Every module comes with a Readme containing de- 
tailed information. 

Appendix B. Examples 

We show how to use CosmoHammer to estimate 
cosmological parameters with CAMB and WMAP like- 
lihood. The listed source code is also available in the 
distributed tarball. 

The import statements for both CosmoHammer and 
numpy have been omitted. We first define the initiali- 
sation by specifying the estimated start center, mini- 
mal and maximal value, and the start width using, for 
example, the values defined in Table [T] 

#parameter start center , min , max, 

start width 
params=np . array ([ [70 , 40, 100, 3], 

[0.0226 , 0.005 , 0.1 , 0.001] , 



^http : / /lambda . gsf c . nasa . gov/product/map/ 
|curr eiit/likelihood_get . cfm| 

|http : / /heasarc . gsf c . nasa . gov/docs/ software/ 
f itsio/f itsio .htmli 



[0.122, 0.01, 0.99, 0.01], 
[2.1e-9, 1.48e-9, 5.45e-9, le 
-10], 

[0.96, 0.5, 1.5, 0.02], 
[0.09, 0.01, 0.8, 0.03]]) 

After instantiating the chain and passing the min 
and max parameter boundaries as follows, the chain 
will check if the proposed walker positions are within 
the boundaries before calling the modules. 

chain = LikelihoodComputationChain( 
min=params [ : , 1 ] , 
max=params [ : , 2 ] ) 

We create an instance of the CambCoreModule and 
add it to the previously instantiated chain. At this 
point it is possible to add other modules, e.g. further 
theory prediction modules. 

camb=CambCoreModule ( ) 
chain . addCoreModule (camb ) 

Next, we have to instantiate the WMAP likelihood 
computation module and then add the module to the 
chain. Here, further implementations of likelihood 
modules can be added. 

wmapLikelihood = 

CmbWmapLikelihoodComputationModule 



chain . addLikelihoodModule ( 
wmapLikelihood ) 

Alternatively, we could create an instance of a Cm- 
bLikelihoodComputationChain which automatically 
adds the previous modules. 

chain = 

CmbLikelihoodComputationChain ( 
min=params [ : , 1 ] , 
max=params[: ,2]) 

By calling the setup( ) function the CAMB and WMAP 
modules are initialised. 

chain . setup () 

Finally, we create a CosmoHammerSampler and 
pass the arguments. A walkersRatio of 50 will launch 
50 X 6 = 300 walkers, where six is the number of 
parameters sampled. By calling the startSampling( ) 
function, CosmoHammer will first sample 250 itera- 
tions for burn-in and then run for another 250 itera- 
tions while writing the results to a file. 



s a mpler=CosmoHammerS ampler ( 
params=params , 
likelihoodComputationChain 

= chain , 
f i 1 e Pre fi X =" example " , 
walkersRatio=50, 
burninlterations =250, 
samplelterations =250) 

sampler . starts ampling () 

Further examples can be found in the distributed 
tarball. 

Appendix C. Estimation of autocorrelation 

As mentioned in section [5TT| we want to estimate 
the correlation function Cff(T) from a finite sample 
{df}. For parameter estimation, we are particularly 
interested in the case when f{9t) = 6t is the identity, 
as we want to estimate the mean of the marginals 
of our multidimensional posterior distribution (and 
maybe higher moments). For a single MCMC chain, 
the estimator for Cff{T) was introduced in equation 
(|4]) and for / being the identity, it reads 



C{T) = YiXt - X){X,^T - X), 

n-T ^ 

f=i 

where X is one of the seven parameters of the likeli- 
hood and the estimator of the identity's autocorrela- 
tion function is denoted by C{T). 

We now consider the case of an ensemble sampler 
with sample {^J},gl'.?^ where i is the walker and t 
the iteration. [Goodm an and We^ ( |2010| ) propose 
the following procedure for estimating C(r) of their 
ensemble sampler: Let F, = |; denote the 

ensemble average per iteration. The estimate Cf(T) 
of the correlation function of the process is then given 
by: 



n-T 



(T) = ViF, - F){F,,T -F). (C. 1) 



n-T 



In our analyses we found that large n are needed 
for a stable estimator Cf(T) using this procedure. 
However, altering the calculations slightly did im- 
prove the stability while yielding results consistent 
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Figure C.6: Autocon-elations of parameter QdmIt^ sampled 
by CosmoHammer, fine-tuned, and self-tuning CosmoMC with 
3000 sequential steps, estimated by CiT) (solid line) and Cp{T) 
(dashed line). The shaded regions depict the standard deviation 
of C{T) between the individual walkers or chains. 



with ( |C.1[ ). Instead of calculating the autocorrelation 
of the average steps, we calculated C{T) per walker 
and averaged the results afterwards: 

^^^^ =iYu-n^ ly^ - ^''^^^-^ - 

(' = 1 t= 1 

The diff"erence between the two approaches is visu- 
alised in Figure C.6 for parameter QoMh^- It shows 
that 3'000 steps are not sufficient for estimating the 
autocorrelation time reliably from a single chain, as 
the deviations between the diff"erent chains are still 
too big. Evaluating the autocorrelation time using the 



dashed line given by equation ( |C.1[ ) is hence not ex- 
pected to yield reliable results. Averaging over a large 
number of walkers suppresses the fluctuations and 
improves the stability of the estimated autocorrelation 
time. 

Even if the estimate for C{T) is stable, it is non- 
trivial to deduce r^^p and t,„, defined in equations 
([T]) and ([2]). We used the algorithm proposed by 
Jonathan Goodmanj^and implemented in Python by 
Dan Foreman-Macke>|^to estimate the integrated au- 
tocorrelation time. On the other hand, we evaluated 
the exponential autocorrelation time from a simple 
least-squares fit to C(T) up to the maximal T for 
which C{T) < e'^ holds. 



^ http : / /www . math . nyu . edu/ f aculty/goodman/ 


sof tware/acor/ 






https : //github . com/df m/acor 
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As you can see from Figure C.6[ the autocorre- 
lation function resembles an exponential decay. We 
hence expect the estimated values of T^^p and Tint 
to coincide. Indeed, both t„t„ and r,, 



> exp "XX.. . converge to 
the same value, but the exponential autocorrelation 
time estimate is quite stable already at about 3'000 
iterations per walker (using 350 walkers), while the 
integrated autocorrelation has not yet converged. We 
hence decided to use Texp for estimating the relevant 
time-scales for our sample analysis. 

Yet, it turns out that after analysing the autocor- 
relation time from B'OOO iterations per walker and 



350 walkers, we found in section [53] that only 250 
sequential steps are required for an error in the mean 
of 3.4% of the standard deviation. Hence, more sam- 
ples are needed to properly evaluate the error bars 
than for parameter estimation. Nevertheless, the es- 
timate for Texp is typically of the right order even for 
small numbers of sequential steps. Rounding the re- 
sult generously should hence suffice to get a rough 
upper bound on the error bars. 

For our comparison of the samplers, we decided 
to consider runs with a large number of sequential 
steps in order to get stable estimates for T^xp- We used 
10 independent chains with 50'000 iterations each for 
the CosmoMC runs and 3'000 iterations of 350 walkers 
for the CosmoHammer. The results are stated in Table 

El 
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